186 research outputs found

Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets

Author: Girvan Michelle
Glass Kimberly
Publication date: 03/05/2013

Gene annotation databases (compendiums maintained by the scientific community that describe the biological functions performed by individual genes) are commonly used to evaluate the functional properties of experimentally derived gene sets. Overlap statistics, such as Fisher's Exact Test (FET), are often employed to assess these associations, but don't account for non-uniformity in the number of genes annotated to individual functions or the number of functions associated with individual genes. We find FET is strongly biased toward over-estimating overlap significance if a gene set has an unusually high number of annotations. To correct for these biases, we develop Annotation Enrichment Analysis (AEA), which properly accounts for the non-uniformity of annotations. We show that AEA is able to identify biologically meaningful functional enrichments that are obscured by numerous false-positive enrichment scores in FET, and we therefore suggest it be used to more accurately assess the biological properties of gene sets
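
As a point of reference for the overlap statistic named in this abstract, the following is a minimal sketch of a one-sided Fisher's Exact Test for the overlap between a gene set and the genes annotated to one function. It is not the authors' AEA method, and the function and parameter names (`overlap_p_value`, `n_universe`, and so on) are hypothetical. The sketch assumes every gene is equally likely to be drawn, which is the uniformity assumption the abstract argues does not hold for real annotation data.

```python
from math import comb


def overlap_p_value(n_universe, n_annotated, n_gene_set, n_overlap):
    """One-sided Fisher's Exact Test (hypergeometric upper tail).

    Probability of seeing at least `n_overlap` annotated genes when
    `n_gene_set` genes are drawn uniformly at random, without replacement,
    from `n_universe` genes of which `n_annotated` carry the annotation.
    """
    max_overlap = min(n_annotated, n_gene_set)
    total = comb(n_universe, n_gene_set)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_gene_set - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes in total, 5 annotated to a function,
# and a gene set of 5 genes that are all annotated to that function.
p = overlap_p_value(n_universe=10, n_annotated=5, n_gene_set=5, n_overlap=5)
```

Because the test treats all genes as interchangeable, a gene set made up of heavily annotated genes will tend to receive small p-values for many functions at once, which is the bias the abstract describes.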

arXiv.org e-Print Archive

Harvard University - DASH

High Performance Computing of Gene Regulatory Networks using a Message-Passing Model

Author: Glass Kimberly
Kepner Jeremy
Quackenbush John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/07/2015

Gene regulatory network reconstruction is a fundamental problem in computational biology. We recently developed an algorithm, called PANDA (Passing Attributes Between Networks for Data Assimilation), that integrates multiple sources of 'omics data and estimates regulatory network models. This approach was initially implemented in the C++ programming language and has since been applied to a number of biological systems. In our current research we are beginning to expand the algorithm to incorporate larger and more diverse data sets, to reconstruct networks that contain increasing numbers of elements, and to build not only single network models, but sets of networks. In order to accomplish these "Big Data" applications, it has become critical that we increase the computational efficiency of the PANDA implementation. In this paper we show how to recast PANDA's similarity equations as matrix operations. This allows us to implement a highly readable version of the algorithm using the MATLAB/Octave programming language. We find that the resulting M-code is much shorter (103 compared to 1128 lines) and more easily modifiable for potential future applications. The new implementation also runs significantly faster, with increasing efficiency as the network models increase in size. Tests comparing the C-code and M-code versions of PANDA demonstrate that this speed-up is on the order of 20-80 times for networks of similar dimensions to those we find in current biological applications
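
The idea of recasting a pairwise similarity as matrix operations can be sketched as follows. The abstract does not give PANDA's equations, so the Tanimoto-style similarity used here, x·y / sqrt(|x|² + |y|² − |x·y|), is an assumption chosen for illustration, and all function names are hypothetical. The sketch is in plain Python rather than MATLAB/Octave; it only shows that a per-pair loop and a matrix-product formulation give the same result.

```python
from math import sqrt


def similarity_loop(a, b):
    """Loop form: one similarity per (row of a, column of b) pair."""
    def sim(x, y):
        dot = sum(p * q for p, q in zip(x, y))
        return dot / sqrt(sum(p * p for p in x) + sum(q * q for q in y) - abs(dot))

    return [[sim(row, col) for col in zip(*b)] for row in a]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def similarity_matrix(a, b):
    """Matrix form: every numerator comes from a single product a @ b,
    and the squared norms are computed once per row and once per column."""
    prod = matmul(a, b)
    row_sq = [sum(v * v for v in row) for row in a]
    col_sq = [sum(v * v for v in col) for col in zip(*b)]
    return [
        [prod[i][j] / sqrt(row_sq[i] + col_sq[j] - abs(prod[i][j]))
         for j in range(len(col_sq))]
        for i in range(len(row_sq))
    ]


# Hypothetical toy data: a is 2 x 3, b is 3 x 2.
a = [[1, 0, 2], [0, 1, 1]]
b = [[1, 0], [0, 1], [1, 1]]
loop_result = similarity_loop(a, b)
matrix_result = similarity_matrix(a, b)
```

In an array language the product and the norm sums each become a single vectorized call, which is where both the brevity and the speed-up reported in the abstract would plausibly come from.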

arXiv.org e-Print Archive

Crossref

Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks

Author: Glass Kimberly
Publication date: 01/01/2010

Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function. We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association." "Nmer-association" measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif" algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The "split-motif" algorithm finds pairs of DNA motifs which preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed in standard motif-finding algorithms despite emerging evidence of the importance of complex regulation. In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternate natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes

Digital Repository at the University of Maryland

Parental attachment as a predictor of sexual, physical, and emotional abuse revictimization

Author: Glass Kimberly Lynn
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/2006

Explores why revictimization occurs in women who were sexually abused as children. Examines variables such as nature and severity of childhood abuse, attachment, and self-esteem to identify predictors of repeated abuse. A correlational-regression approach was used to test the hypothesis that lower positive attachment to parental figures, mediated by low self-esteem, will be associated with revictimization in adulthood. Approximately 150 women (Age = 18 to 54; M = 27) from various communities across Southern California participated in the study. Results did not support the hypothesis. Though self-esteem was correlated with both attachment and revictimization individually, there was no mediational effect of self-esteem between parental attachment and revictimization

CSUSB ScholarWorks

Estimating sample-specific regulatory networks

Author: Glass Kimberly
Kuijjer Marieke Lydia
Quackenbush John
Tung Matthew
Yuan GuoCheng
Publication date: 28/06/2018

Biological systems are driven by intricate interactions among the complex array of molecules that comprise the cell. Many methods have been developed to reconstruct network models of those interactions. These methods often draw on large numbers of samples with measured gene expression profiles to infer connections between genes (or gene products). The result is an aggregate network model representing a single estimate for the likelihood of each interaction, or "edge," in the network. While informative, aggregate models fail to capture the heterogeneity that is represented in any population. Here we propose a method to reverse engineer sample-specific networks from aggregate network models. We demonstrate the accuracy and applicability of our approach in several data sets, including simulated data, microarray expression data from synchronized yeast cells, and RNA-seq data collected from human lymphoblastoid cell lines. We show that these sample-specific networks can be used to study changes in network topology across time and to characterize shifts in gene regulation that may not be apparent in expression data. We believe the ability to generate sample-specific networks will greatly facilitate the application of network methods to the increasingly large, complex, and heterogeneous multi-omic data sets that are currently being generated, and ultimately support the emerging field of precision network medicine

arXiv.org e-Print Archive

Directory of Open Access Journals

NORA - Norwegian Open Research Archives

Passing Messages between Biological Networks to Refine Predicted Interactions

Author: Glass Kimberly
Huttenhower Curtis
Quackenbush John
Yuan Guo-Cheng
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net

CiteSeerX

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Combinatorial Recruitment of CREB, C/EBPβ and c-Jun Determines Activation of Promoters upon Keratinocyte Differentiation

Author: Bhattacharya Paramita
Chatterjee Raghunath
Glass Kimberly
Rozenberg Julian M.
Vinson Charles
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/11/2013

Background: Transcription factors CREB, C/EBPβ and Jun regulate genes involved in keratinocyte proliferation and differentiation. We asked whether specific combinations of CREB, C/EBPβ and c-Jun bound to promoters correlate with RNA polymerase II binding, mRNA transcript levels and methylation of promoters in proliferating and differentiating keratinocytes. Results: Induction of mRNA and RNA polymerase II by differentiation is highest when promoters are bound by C/EBPβ alone, C/EBPβ together with c-Jun, or by CREB, C/EBPβ and c-Jun, although in this case CREB binds with low affinity. In contrast, RNA polymerase II binding and mRNA levels change the least upon differentiation when promoters are bound by CREB either alone or in combination with C/EBPβ or c-Jun. Notably, promoters bound by CREB have relatively high levels of RNA polymerase II binding irrespective of differentiation. Inhibition of C/EBPβ or c-Jun preferentially represses mRNA when gene promoters are bound by corresponding transcription factors and not CREB. Methylated promoters have relatively low CREB binding and, accordingly, those which are bound by C/EBPβ are induced by differentiation irrespective of CREB. Composite “Half and Half” consensus motifs and co-localizing consensus DNA binding motifs are overrepresented in promoters bound by the combination of corresponding transcription factors. Conclusion: Correlational and functional data describe combinatorial mechanisms regulating the activation of promoters. Colocalization of C/EBPβ and c-Jun on promoters without strong CREB binding determines high probability of activation upon keratinocyte differentiation

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

FigShare

TFutils: data structures for transcription factor bioinformatics

Author: Carey Vincent
Everaert Celine
Glass Kimberly
Gopaulakrishnan Shweta
Pochet Nathalie
Raby Benjamin
Stubbs Benjamin J
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2019

Ghent University Academic Bibliography